Let $\widetilde X_{M\times N}$ be a rectangular data matrix with independent real-valued entries $[\widetilde x_{ij}]$ satisfying $\mathbb{E}\widetilde x_{ij} = 0$ and $\mathbb{E}\widetilde x_{ij}^2 = 1/M$, $N, M \to \infty$.
These entries have a subexponential decay at the tails. We will be
working in the regime $N/M = d_N$, $\lim_{N\to\infty} d_N \ne 0, 1, \infty$. In this paper
we prove the edge universality of correlation matrices $X^\dagger X$, where the
rectangular matrix $X$ (called the standardized matrix) is obtained by
normalizing each column of the data matrix $\widetilde X$ by its Euclidean norm.
Our main result states that asymptotically the $k$-point ($k \ge 1$) correlation
functions of the extreme eigenvalues (at both edges of the spectrum)
of the correlation matrix $X^\dagger X$ converge to those of the Gaussian
correlation matrix, that is, Tracy-Widom law, and, thus, in particular,
the largest and the smallest eigenvalues of $X^\dagger X$ after appropriate
centering and rescaling converge to the Tracy-Widom distribution.
The asymptotic distribution of extreme eigenvalues of the Gaussian 
correlation matrix has been worked out only recently. As a corollary 
of the main result in this paper, we also obtain that the extreme eigen- 
values of Gaussian correlation matrices are asymptotically distributed 
according to the Tracy-Widom law. The proof is based on the compar- 
ison of Green functions, but the key obstacle to be surmounted is the 
strong dependence of the entries of the correlation matrix. We achieve 
this via a novel argument which involves comparing the moments of 
product of the entries of the standardized data matrix to those of the 
raw data matrix. Our proof strategy may be extended for proving the 
edge universality of other random matrix ensembles with dependent 
entries and hence is of independent interest. 

1. Introduction. The aim of this paper is to prove the edge universality
of correlation matrices. The data matrix $\widetilde X = (\widetilde x_{ij})$ is an $M \times N$ matrix with
independent centered real-valued entries. The entries in each column $j$ are all
assumed to be identically distributed:

(1.1) $\widetilde x_{ij} = M^{-1/2} q_{ij}, \quad \mathbb{E} q_{ij} = 0, \quad \mathbb{E} q_{ij}^2 = \sigma_j^2, \quad 1 \le i \le M.$

Furthermore, the entries $q_{ij}$ have a subexponential decay, that is, there exists
a constant $\vartheta > 0$ such that for $u > 1$,

(1.2) $\mathbb{P}(|q_{ij}| > u\sigma_j) \le \exp(-u^{\vartheta}).$

We will be working in the regime

(1.3) $d = d_N = N/M, \quad \lim_{N\to\infty} d_N \ne 0, 1, \infty.$

Thus, without loss of generality, henceforth we will assume that for some
small constant $\theta$, for all $N \in \mathbb{N}$,

$\theta < d_N < \theta^{-1}$ and $\theta < |d_N - 1|.$

Notice that all our constants may depend on $\theta$ and $\vartheta$, but we will subsume
this dependence in the notation.

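For a Euclidean vector $a \in \mathbb{R}^n$, define the $\ell_2$ norm

$\|a\|_2 := \left(\sum_{i=1}^n |a_i|^2\right)^{1/2}.$

The matrix $\widetilde X^\dagger \widetilde X$ is the usual covariance matrix. The $j$th column of $\widetilde X$ is
denoted by $\widetilde x_j$. Define the $M \times N$ matrix $X = (x_{ij})$ by

(1.4) $x_{ij} := \widetilde x_{ij} / \|\widetilde x_j\|_2.$

The ($N \times N$) matrix $X^\dagger X$ is called the correlation matrix.^1 Since the entries within a column are identically distributed and each column of $X$ has unit norm, using the identity $\mathbb{E} x_{ij}^2 = \frac{1}{M}\,\mathbb{E}\sum_{i=1}^M x_{ij}^2$, we have

$\mathbb{E} x_{ij}^2 = M^{-1}.$

Since we are mainly interested in correlation matrices, without loss of generality,
henceforth we will assume that

$\sigma_j = 1, \quad 1 \le j \le N.$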
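The standardization (1.4) and the reduction to $\sigma_j = 1$ can be illustrated with a minimal sketch. The helper names `standardize_columns` and `gram`, the matrix sizes and the Gaussian entries are assumptions made only for this illustration; the paper itself does not contain code.

```python
import math
import random


def standardize_columns(raw):
    """Divide every column of an M x N matrix (a list of rows) by its Euclidean norm."""
    m, n = len(raw), len(raw[0])
    norms = [math.sqrt(sum(raw[i][j] ** 2 for i in range(m))) for j in range(n)]
    return [[raw[i][j] / norms[j] for j in range(n)] for i in range(m)]


def gram(x):
    """Return the N x N matrix X^T X for a real M x N matrix stored as a list of rows."""
    m, n = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(m)) for b in range(n)]
            for a in range(n)]


rng = random.Random(0)
M, N = 40, 10
# Raw data with a different (hypothetical) scale sigma_j in each column.
sigmas = [1.0 + j for j in range(N)]
raw = [[sigmas[j] * rng.gauss(0.0, 1.0) / math.sqrt(M) for j in range(N)]
       for _ in range(M)]
# The same data with every sigma_j set to 1.
unit = [[raw[i][j] / sigmas[j] for j in range(N)] for i in range(M)]

X = standardize_columns(raw)
X_unit = standardize_columns(unit)
corr = gram(X)  # the correlation matrix X^T X
```

In this sketch the diagonal of `corr` is identically 1 and `X` coincides with `X_unit`, which is why the column scales $\sigma_j$ play no role for the correlation matrix.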

Covariance matrices are ubiquitous in modern multivariate statistics where 
the advance of technology has led to a profusion of high-dimensional data 
sets. See [17-19, 24] and the references therein for motivation and appli- 
cations in a wide variety of fields. Correlation matrices are sometimes pre- 
ferred in certain statistical applications. For instance, the classic exploratory 
method Principal Component Analysis (PCA) is not invariant to change of
scale in the matrix entries. Therefore, it is often recommended first to
standardize the matrix entries and then perform PCA on the resulting correlation
matrix [17].

^1 Some authors prefer to call this the standardized covariance matrix, but we chose this
terminology from the statistical literature [17].

Recent progress in random matrix theory has led to a wealth of tech- 
niques for proving universality of various matrix ensembles (see [3-13, 16, 
20, 21, 26, 27] and the references therein). Here the word universality refers 
to the phenomenon that the asymptotic distributions of various functionals
of covariance/correlation matrices (such as eigenvalues, eigenvectors, etc.) are
identical to those of Gaussian covariance/correlation matrices. Thus, harnessing
these methods to obtain universality results in statistical problems is an
important step, since these results let us calculate the exact asymptotic
distributions of various test statistics without imposing restrictive distributional
assumptions on the matrix entries. For instance, an important consequence
of universality is that in some cases one can perform various hypothesis tests 
under the assumption that the matrix entries are not normally distributed 
but use the same test statistic as in the Gaussian case. 

In this context, in a recent paper [24] we studied the asymptotic distribution
of the eigenvalues of the covariance matrix $\widetilde X^\dagger \widetilde X$ under the assumptions
of (1.1) and (1.2). In [24], we proved that the Stieltjes transform of the
empirical eigenvalue distribution of the sample covariance matrix is given by the
Marcenko-Pastur law [22] uniformly up to the edges of the spectrum with
an error of order $(N\eta)^{-1}$, where $\eta$ is the imaginary part of the spectral
parameter in the Stieltjes transform. From this strong local Marcenko-Pastur
law, we derived the following results: (1) rigidity of eigenvalues, (2) delocalization
of eigenvectors, (3) universality of eigenvalues in the bulk and (4)
universality of eigenvalues at the edges. Furthermore, in our proof of edge
universality of eigenvalues for covariance matrices (see Theorem 7.5 of [24]),
we gave a sufficient criterion for checking whether two matrices of the form $Q^\dagger Q$
($Q$ is a data matrix) have the same asymptotic eigenvalue distribution at the
edge (see Section 3 for details). Here $Q^\dagger Q$ could be quite general, including
covariance and correlation matrices.

Verifying the above criterion for correlation matrices is much more complicated,
owing to the fact that even though the matrix has the same form $X^\dagger X$ as above, the
entries of $X$ are not independent. Fortunately in [24], as a byproduct,
we also proved the strong Marcenko-Pastur law, the rigidity of eigenvalues 
and delocalization of eigenvectors of correlation matrices (see Lemma 2.3 
in Section 2 below or Theorem 1.5 of [24]). In this paper, we complete the 
research program initiated in [24] by proving the edge universality of cor- 
relation matrices. There are not many papers which study the asymptotics 
of the correlation matrices as compared to the relatively large literature on 
covariance matrices. The asymptotic distribution of the largest (appropriately
rescaled) eigenvalue of the Gaussian correlation matrix was only very
recently established by [1]. As will be explained below, we also obtain this
result as a special case of our main result and, more importantly, we do not 
need this result in our proof (see Remark 1.3). The almost sure convergence 
of the largest and smallest eigenvalues of the correlation matrix was estab- 
lished in [15]. The very recent paper [1], relying on our results in [24], shows 
that the asymptotic distribution of the largest or smallest eigenvalue of the 
correlation matrix is given by the Tracy-Widom law, under the assumption
that the data matrix X satisfies (1.1) and its entries have symmetric distri- 
butions. In particular, the authors in [1] use the above mentioned sufficiency 
criteria for edge universality developed in [24]. Furthermore, the assumption 
that the matrix entries are symmetric is very restrictive and not natural in 
statistical applications. In this paper we will build on our previous work [24] 
and prove edge universality of correlation matrices just under the assumptions
(1.1) and (1.2). Furthermore, we believe that all of our main results
should hold if one replaces the subexponential tail decay of the matrix entries
by a uniform bound on the $p$th moment ($p > 4$) of the matrix entries
(e.g., $p = 13$ will suffice), as proved in [3] for Wigner matrices.

The central ideas in this paper are based on the general machinery for 
proving universality established in a series of recent papers [3-13, 20, 21], 
where the authors Yau, Erdős et al. study the distribution of eigenvalues and
eigenvectors by studying the Green's functions (resolvent) of the random 
matrices. 

The proof of this paper is based on the comparison of Green's functions 
first initiated in [12], but, as mentioned earlier, the key obstacle to be sur- 
mounted is the strong dependence of the entries of the correlation matrix. 
We achieve this via a novel argument which involves comparing the mo- 
ments of the product of the entries of the standardized data matrix to those 
of the raw data matrix (see Section 3 for a summary of the key ideas). Our 
proof strategy may be extended for proving the edge universality of other 
random matrix ensembles with dependent entries and hence is of indepen- 
dent interest. Furthermore, it will be interesting to see if bulk universality 
of correlation matrices can be established using the methods developed in 
this paper. 

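Let us state the main result now. We denote by $\lambda_i$, $1 \le i \le N$, the eigenvalues
of $X^\dagger X$ and set $\lambda_\alpha = 0$ for $\min\{N, M\} + 1 \le \alpha \le \max\{N, M\}$. We order
them as

$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{\max\{M,N\}} \ge 0.$

Analogously, let $\widetilde\lambda_\alpha$ denote the eigenvalues of the matrix $\widetilde X^\dagger \widetilde X$.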

The following is the main result of this paper. It shows that the largest 
and smallest k eigenvalues of the correlation matrix, after appropriate cen- 
tering and rescaling, converge in distribution to those of the corresponding 
covariance matrix. 
Theorem 1.1 (Edge universality). Let $X$ and $\widetilde X$, respectively, denote
the correlation and covariance matrix as defined in (1.1)-(1.4). For any
fixed $k \in \mathbb{N}$, there exist $\varepsilon > 0$ and $\delta > 0$ such that for any $\{s_1, s_2, \ldots, s_k\} \subset \mathbb{R}$
(which may depend on $N$), there exists $N_0 \in \mathbb{N}$ independent of $s_1, s_2, \ldots, s_k$
such that for all $N \ge N_0$, we have

(1.5)
$\mathbb{P}(N^{2/3}(\widetilde\lambda_1 - \lambda_+) \le s_1 - N^{-\varepsilon}, \ldots, N^{2/3}(\widetilde\lambda_k - \lambda_+) \le s_k - N^{-\varepsilon}) - N^{-\delta}$
$\le \mathbb{P}(N^{2/3}(\lambda_1 - \lambda_+) \le s_1, \ldots, N^{2/3}(\lambda_k - \lambda_+) \le s_k)$
$\le \mathbb{P}(N^{2/3}(\widetilde\lambda_1 - \lambda_+) \le s_1 + N^{-\varepsilon}, \ldots, N^{2/3}(\widetilde\lambda_k - \lambda_+) \le s_k + N^{-\varepsilon}) + N^{-\delta}.$

An analogous result holds for the $k$ smallest eigenvalues.

In [14, 23] and [25], Péché, Soshnikov and Sodin proved that for some
covariance matrices (including the Wishart matrix), the largest and smallest $k$
eigenvalues after appropriate centering and rescaling converge in distribution
to the Tracy-Widom law^2 whose density is a smooth function. Combining
with our recent result on the universality of covariance matrices in [24], we
have the following immediate corollary of Theorem 1.1:

Corollary 1.2. Let $X$ denote the correlation matrix as defined in (1.1)-(1.4).
For any fixed $k > 0$, we have

$\left( \frac{M\lambda_1 - (\sqrt{N} + \sqrt{M})^2}{(\sqrt{N} + \sqrt{M})(1/\sqrt{N} + 1/\sqrt{M})^{1/3}}, \ldots, \frac{M\lambda_k - (\sqrt{N} + \sqrt{M})^2}{(\sqrt{N} + \sqrt{M})(1/\sqrt{N} + 1/\sqrt{M})^{1/3}} \right) \longrightarrow TW_1,$

where $TW_1$ denotes the Tracy-Widom distribution. An analogous statement
holds for the $k$ smallest (nontrivial) eigenvalues.

Remark 1.3. Thus, as a special case, we also obtain the TW law for
the Gaussian correlation matrices.

Although the current paper builds on our recent work [24], it is mostly
self-contained and, for the reader's convenience, we will recall all of the needed
results from [24]. The rest of the paper is organized as follows. In Section 2,
after establishing some notation, we give the key results establishing
the strong Marcenko-Pastur law and rigidity of eigenvalues for correlation
matrices, as obtained from [24]. In Section 3 we give a brief proof sketch
illustrating the key ideas. In Section 4 we give the proof of the main results
and in Section 5 we prove some technical lemmas which constitute the key
ingredients in the proof of the main result. For the rest of the paper the
letter $C$ will denote a generic constant whose value might change from one
line to the next, but will be independent of everything else. The notation
$O_\varepsilon(N^{a})$ will be used to denote $O(N^{a + C\varepsilon})$.

^2 Here we use the term Tracy-Widom law as in [25].

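2. Preliminaries. In this paper we adopt the notation of [24].
Define the Green function of $X^\dagger X$ by

(2.1) $G_{ij}(z) = \left(\frac{1}{X^\dagger X - z}\right)_{ij}, \quad z = E + i\eta, \quad E \in \mathbb{R}, \ \eta > 0.$

The Stieltjes transform of the empirical eigenvalue distribution of $X^\dagger X$ is
given by

(2.2) $m(z) := \frac{1}{N}\sum_{j} G_{jj}(z) = \frac{1}{N}\operatorname{Tr}\frac{1}{X^\dagger X - z}.$

Recall that $d = N/M$ from (1.3) and define

(2.3) $\lambda_\pm := (1 \pm \sqrt{d})^2.$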

The Marcenko-Pastur (henceforth abbreviated by MP) law is given by

(2.4) $\varrho_W(x) = \frac{1}{2\pi d}\sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}.$

We define $m_W(z)$, $z \in \mathbb{C}$, as the Stieltjes transform of $\varrho_W$, that is,

(2.5) $m_W(z) = \int_{\mathbb{R}} \frac{\varrho_W(x)}{x - z}\,dx.$

The function $m_W$ depends on $d$ and has the closed form solution

(2.6) $m_W(z) = \frac{1 - d - z + i\sqrt{(z - \lambda_-)(\lambda_+ - z)}}{2dz},$

where $\sqrt{\cdot}$ denotes the square root on the complex plane whose branch cut is
the negative real line. We also define the classical location $\gamma_j$ of the eigenvalues
with $\varrho_W$ as follows:

(2.7) $\int_{\gamma_j}^{\lambda_+} \varrho_W(x)\,dx = \int_{\gamma_j}^{+\infty} \varrho_W(x)\,dx = j/N.$
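The formulas (2.3)-(2.6) can be evaluated numerically with a short sketch. It assumes the forms $\lambda_\pm = (1 \pm \sqrt{d})^2$, $\varrho_W(x) = \sqrt{[(\lambda_+ - x)(x - \lambda_-)]_+}/(2\pi d x)$ and the closed form of $m_W$ with the principal square root; the function names, the value $d = 0.5$, the test point $z$ and the quadratic relation used as a check are choices made for this illustration only.

```python
import cmath
import math


def mp_edges(d):
    """Spectral edges lambda_- and lambda_+ of (2.3)."""
    return (1.0 - math.sqrt(d)) ** 2, (1.0 + math.sqrt(d)) ** 2


def mp_density(x, d):
    """Density of (2.4); it vanishes outside [lambda_-, lambda_+]."""
    lo, hi = mp_edges(d)
    inside = (hi - x) * (x - lo)
    if inside <= 0.0 or x <= 0.0:
        return 0.0
    return math.sqrt(inside) / (2.0 * math.pi * d * x)


def mp_stieltjes(z, d):
    """Closed form (2.6); cmath.sqrt has its branch cut on the negative real line."""
    lo, hi = mp_edges(d)
    return (1.0 - d - z + 1j * cmath.sqrt((z - lo) * (hi - z))) / (2.0 * d * z)


def mp_mass(d, steps=200000):
    """Midpoint-rule approximation of the integral of the density over its support."""
    lo, hi = mp_edges(d)
    h = (hi - lo) / steps
    return h * sum(mp_density(lo + (k + 0.5) * h, d) for k in range(steps))


d = 0.5
z = 1.3 + 0.05j
m = mp_stieltjes(z, d)
# Squaring (2.6) shows that m_W solves d*z*m^2 + (z + d - 1)*m + 1 = 0.
residual = d * z * m * m + (z + d - 1.0) * m + 1.0
mass = mp_mass(d)
```

For $d < 1$ the density carries total mass 1, and $\Im m_W(z) > 0$ for $\Im z > 0$, which is what selects the branch of the square root.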



Define the parameter

(2.8) $\varphi := (\log N)^{\log\log N}.$

Definition 2.1 (High probability events). Let $\zeta > 0$. We say that an
event $\Omega$ holds with $\zeta$-high probability if there exists a constant $C > 0$ such
that

(2.9) $\mathbb{P}(\Omega^c) \le N^{C}\exp(-\varphi^{\zeta})$

for large enough $N$.



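Let us first give the following large deviation lemma for independent random
variables (see [12], Appendix B for a proof).

Lemma 2.2 (Large deviation lemma). Suppose, for $1 \le i \le M$, $a_i$ are
independent, mean zero complex variables, with $\mathbb{E}|a_i|^2 = \sigma^2$ and have a
subexponential decay as in (1.2). Then there exists a constant $\rho \equiv \rho(\vartheta) > 1$ such
that, for any $\zeta > 0$ and for any $A_i \in \mathbb{C}$ and $B_{ij} \in \mathbb{C}$, the bounds

(2.10) $\left|\sum_{i=1}^{M} a_i A_i\right| \le (\log M)^{\rho\zeta}\,\sigma\,\|A\|,$

(2.11) $\left|\sum_{i=1}^{M} \bar a_i B_{ii} a_i - \sum_{i=1}^{M} \sigma^2 B_{ii}\right| \le (\log M)^{\rho\zeta}\,\sigma^2 \left(\sum_{i=1}^{M} |B_{ii}|^2\right)^{1/2},$

(2.12) $\left|\sum_{i \ne j} \bar a_i B_{ij} a_j\right| \le (\log M)^{\rho\zeta}\,\sigma^2 \left(\sum_{i \ne j} |B_{ij}|^2\right)^{1/2}$

hold with $\zeta$-high probability.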

It can be easily seen that for any fixed $j \le N$, the random variables
defined by $a_i = x_{ij}$, $1 \le i \le M$, satisfy the large deviation bounds (2.10), (2.11)
and (2.12), for any $A_i \in \mathbb{C}$ and $B_{ij} \in \mathbb{C}$ and $\zeta > 0$.

Thus, the main result of [24] (see Theorem 1.5 of [24]) is applicable for 
the correlation matrix X, yielding the following strong local MP law and 
rigidity of eigenvalues: 

Lemma 2.3 (Strong local Marcenko-Pastur law and rigidity of the eigenvalues
of the correlation matrix). Let $X = [x_{ij}]$ be the correlation matrix
given by (1.4). Then for any $\zeta > 0$ there exists a constant $C_\zeta$ such that the
following events hold with $\zeta$-high probability.

(i) The Stieltjes transform of the empirical eigenvalue distribution of
$X^\dagger X$ satisfies

(2.13) $\bigcap_{z \in S(C_\zeta)} \left\{ |m(z) - m_W(z)| \le \varphi^{C_\zeta}\frac{1}{N\eta} \right\},$

where $S(C_\zeta)$ is defined as the set

$S(C_\zeta) := \{z \in \mathbb{C} : \mathbf{1}_{d>1}(\lambda_-/5) \le E \le 5\lambda_+, \ \varphi^{C_\zeta} N^{-1} \le \eta \le 10(1 + d)\}.$

(ii) The individual matrix elements of the Green function satisfy

(2.14) $\bigcap_{z \in S(C_\zeta)} \left\{ \max_{ij} |G_{ij}(z) - m_W(z)\delta_{ij}| \le \varphi^{C_\zeta}\left(\sqrt{\frac{\Im m_W(z)}{N\eta}} + \frac{1}{N\eta}\right) \right\}.$

(iii) The smallest nonzero and largest eigenvalues of $X^\dagger X$ satisfy

(2.15) $\lambda_- - N^{-2/3}\varphi^{C_\zeta} \le \min_{j \le \min\{M,N\}} \lambda_j \le \max_j \lambda_j \le \lambda_+ + N^{-2/3}\varphi^{C_\zeta}.$

(iv) Rigidity of the eigenvalues: recall $\gamma_j$ in (2.7). For any $1 \le j \le \min\{M, N\}$,
let $\hat j = \min\{\min\{N, M\} + 1 - j, \ j\}$. Then

(2.16) $|\lambda_j - \gamma_j| \le \varphi^{C_\zeta} N^{-2/3}\,\hat j^{-1/3}.$

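We conclude this section with the following theorem quoted from [24] (see
Theorem 1.7 in [24]) on edge universality of covariance matrices, which is also
needed for our proof of the edge universality of the correlation matrix. Define
two independent matrices $X^{\mathbf v} = [x^{\mathbf v}_{ij}]$, $X^{\mathbf w} = [x^{\mathbf w}_{ij}]$ with the entries $x^{\mathbf v}_{ij}$, $x^{\mathbf w}_{ij}$
satisfying (1.1) and (1.2) and such that the entries $x^{\mathbf v}_{ij}$, $x^{\mathbf w}_{ij}$ are mutually independent.
Henceforth, we will write $\mathbb{E}^{\mathbf v}$, $\mathbb{P}^{\mathbf v}$ ($\mathbb{E}^{\mathbf w}$, $\mathbb{P}^{\mathbf w}$) to indicate that the expectation
and probability are computed for the ensemble $X^{\mathbf v}$ ($X^{\mathbf w}$).

Theorem 2.4 (Universality of extreme eigenvalues of covariance matrices).
There exist $\varepsilon > 0$ and $\delta > 0$ such that for any $s \in \mathbb{R}$ (which may depend
on $N$) there exists $N_0 \in \mathbb{N}$ independent of $s$ such that for all $N \ge N_0$,
we have

(2.17)
$\mathbb{P}^{\mathbf v}(N^{2/3}(\lambda_1^{\mathbf v} - \lambda_+) \le s - N^{-\varepsilon}) - N^{-\delta}$
$\le \mathbb{P}^{\mathbf w}(N^{2/3}(\lambda_1^{\mathbf w} - \lambda_+) \le s)$
$\le \mathbb{P}^{\mathbf v}(N^{2/3}(\lambda_1^{\mathbf v} - \lambda_+) \le s + N^{-\varepsilon}) + N^{-\delta}.$

An analogous result holds for the smallest eigenvalues $\lambda^{\mathbf v}_{\min\{M,N\}}$ and $\lambda^{\mathbf w}_{\min\{M,N\}}$.

As remarked in [24], Theorem 2.4 can be extended to finite correlation
functions of extreme eigenvalues as follows:

(2.18)
$\mathbb{P}^{\mathbf v}(N^{2/3}(\lambda_1^{\mathbf v} - \lambda_+) \le s_1 - N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_k^{\mathbf v} - \lambda_+) \le s_k - N^{-\varepsilon}) - N^{-\delta}$
$\le \mathbb{P}^{\mathbf w}(N^{2/3}(\lambda_1^{\mathbf w} - \lambda_+) \le s_1, \ldots, N^{2/3}(\lambda_k^{\mathbf w} - \lambda_+) \le s_k)$
$\le \mathbb{P}^{\mathbf v}(N^{2/3}(\lambda_1^{\mathbf v} - \lambda_+) \le s_1 + N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_k^{\mathbf v} - \lambda_+) \le s_k + N^{-\varepsilon}) + N^{-\delta}$

for all $k$ fixed and sufficiently large $N$. We remark that edge universality
is usually formulated in terms of joint distributions of edge eigenvalues as
in (2.18) with fixed parameters $s_1, s_2, \ldots$ etc. However, we note that
Theorem 2.4 holds uniformly in these parameters, and thus they may depend
on $N$.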

3. Key ideas and proof sketch. Our basic strategy is the so-called "Green 
function comparison" method initiated in a recent series of papers includ- 
ing [11-13] for proving universality for (generalized) Wigner matrices. The 
Green function comparison method has subsequently been applied to prov- 
ing the spectral universality of adjacency matrices of random graphs [3, 4], 
the universality of eigenvectors of Wigner matrices [20], as well as the
spectrum of additive finite-rank deformations of Wigner matrices and the
isotropic local semicircle law [21].

In this paper, we will show that (2.17) and (2.18) still hold with $X^{\mathbf v}$
and $X^{\mathbf w}$ replaced by the correlation matrix $X$ and the corresponding
covariance matrix $\widetilde X$, that is, Theorem 1.1. To show this result, we introduce
a sufficient criterion for (2.17) and (2.18) derived in [24] (see Theorem 7.5
of [24]).

Consider two matrix ensembles $X^{\mathbf v}$, $X^{\mathbf w}$ (these could be covariance, correlation
or more general matrices^3) and let their respective Green functions and
empirical Stieltjes transforms [see (2.1) and (2.2)] be denoted by $G^{\mathbf v}$, $G^{\mathbf w}$
and $m^{\mathbf v}$, $m^{\mathbf w}$. To prove that the asymptotic distributions of the extreme
eigenvalues of the matrix ensembles $X^{\mathbf v}$, $X^{\mathbf w}$ are identical in the sense of (2.17)
and (2.18), it suffices to show the following [24]:

(i) The matrices $X^{\mathbf v}$, $X^{\mathbf w}$ satisfy the strong Marcenko-Pastur law and
the rigidity of eigenvalues as given in Lemma 2.3.

(ii) The difference of the expectation of smooth functionals of the
corresponding Green functions ($G^{\mathbf v}$, $G^{\mathbf w}$ and $m^{\mathbf v}$, $m^{\mathbf w}$) evaluated at the spectral
edge must vanish asymptotically. More precisely, as pointed out in [24], it
suffices to establish Theorems 3.1 and 3.2 below for the matrices $X^{\mathbf v}$, $X^{\mathbf w}$.

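Theorem 3.1 (Green function comparison theorem on the edge). Let
$F : \mathbb{R} \to \mathbb{R}$ be a function whose derivatives satisfy

(3.1) $\max_x |F^{(\alpha)}(x)|\,(|x| + 1)^{-C_1} \le C_1, \quad \alpha = 1, 2, 3, 4,$

for some constant $C_1 > 0$. Then there exist $\varepsilon_0 > 0$, $N_0 \in \mathbb{N}$ and $\delta > 0$ depending
only on $C_1$ such that for any $\varepsilon < \varepsilon_0$, $N \ge N_0$ and real numbers $E$, $E_1$
and $E_2$ satisfying

$|E - \lambda_+| \le N^{-2/3+\varepsilon}, \quad |E_1 - \lambda_+| \le N^{-2/3+\varepsilon}, \quad |E_2 - \lambda_+| \le N^{-2/3+\varepsilon}$

and $\eta_0 = N^{-2/3-\varepsilon}$, we have

(3.2) $|\mathbb{E}^{\mathbf v} F(N\eta_0 \Im m^{\mathbf v}(z)) - \mathbb{E}^{\mathbf w} F(N\eta_0 \Im m^{\mathbf w}(z))| \le C N^{-\delta + C\varepsilon}, \quad z = E + i\eta_0,$

and

(3.3) $\left| \mathbb{E}^{\mathbf v} F\left(N \int_{E_1}^{E_2} dy\, \Im m^{\mathbf v}(y + i\eta_0)\right) - \mathbb{E}^{\mathbf w} F\left(N \int_{E_1}^{E_2} dy\, \Im m^{\mathbf w}(y + i\eta_0)\right) \right| \le C N^{-\delta + C\varepsilon}$

for some constant $C$.

^3 Notice that throughout the paper we use $X$ for the correlation matrix and $\widetilde X$ for
the covariance matrix. This is the only instance we denote a generic matrix by $X$ for
compactness of notation.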

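Theorem 3.2. Fix any $k \in \mathbb{N}_+$ and let $F : \mathbb{R}^k \to \mathbb{R}$ be a smooth, bounded
function with bounded derivatives. Then there exist $\varepsilon_0 > 0$, $N_0 \in \mathbb{N}$ and $\delta > 0$
such that for any $\varepsilon < \varepsilon_0$, $N \ge N_0$ and sequence of real numbers $E_k < \cdots < E_1 < E_0$
with $|E_j - \lambda_+| \le N^{-2/3+\varepsilon}$, $j = 0, 1, \ldots, k$, and $\eta_0 = N^{-2/3-\varepsilon}$, we
have

(3.4) $\left| \mathbb{E}^{\mathbf v} F\left(N \int_{E_1}^{E_0} dy\, \Im m^{\mathbf v}(y + i\eta_0), \ldots, N \int_{E_k}^{E_0} dy\, \Im m^{\mathbf v}(y + i\eta_0)\right) - \mathbb{E}^{\mathbf w} F(m^{\mathbf v} \to m^{\mathbf w}) \right| \le C N^{-\delta + C\varepsilon},$

where the second term in the left-hand side above is obtained by changing
the arguments of $F$ in the first term from $m^{\mathbf v}$ to $m^{\mathbf w}$ and keeping all the
other parameters fixed.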



Remark 3.3. Theorems 3.1 and 3.2 yield the edge universality of the
$k$-point correlation functions at the edge for $k = 1$ and $k > 1$, respectively.

Thus, to complete the proof of Theorem 1.1, by the Green function
comparison method it suffices to show (i) and (ii) above for

$X^{\mathbf v} = X, \quad X^{\mathbf w} = \widetilde X,$

where $X^\dagger X$ denotes the correlation matrix and $\widetilde X^\dagger \widetilde X$ is the corresponding
covariance matrix. Here condition (i) is guaranteed by Lemma 2.3.

Verifying condition (ii) is the heart of this paper. In previous works
mentioned earlier, the authors use a Lindeberg replacement strategy, as in [2,
27]. These proofs proceed via showing that the distribution of some smooth
functional of the Green function [e.g., $G_{ii}$, $m$ and $(x_i, G x_i)$] of the two
matrix ensembles is identical asymptotically provided that the first two (in
some cases up to four) moments of all matrix elements of these two ensembles
are identical. For instance, if one needs to show the edge universality of two
covariance matrices $X^{\mathbf v}$ and $X^{\mathbf w}$, the basic strategy is to express

(3.5) $\mathbb{E}F(G^{\mathbf v}) - \mathbb{E}F(G^{\mathbf w}) = \sum_{\gamma=1}^{MN} [\mathbb{E}F(G_\gamma) - \mathbb{E}F(G_{\gamma-1})],$

where $F$ is a smooth function and $G_\gamma$ denotes the Green function of the
ensemble $X_\gamma$ (with $X_0 = X^{\mathbf w}$) which is obtained from $X_{\gamma-1}$ by replacing the
distribution of the $ij$th entry of $X_{\gamma-1}$, $x^{\mathbf w}[ij]$, with $x^{\mathbf v}[ij]$ [here $\gamma = i + (j - 1)M$]
so that $X_{MN} = X^{\mathbf v}$. The next step is to obtain an estimate

(3.6) $\mathbb{E}F(G_\gamma) - \mathbb{E}F(G_{\gamma-1}) = o(N^{-2})$

for each of the $O(N^2)$ terms in the sum (3.5). Usually (3.6) is obtained by
resolvent expansions, perturbation theory and the fact that $X_\gamma$ and $X_{\gamma-1}$
differ by a single entry and the first few moments of these two distributions
are the same.

But clearly the above method does not work in our case, since the entries 
within the same column are not independent and, therefore, one cannot re- 
place the distribution of a single entry of a column without changing the 
distribution of all the other $M - 1$ entries. To circumvent this, in [24] a new
telescoping argument consisting of $O(N)$ ensembles was used for the comparison
of Green functions. The idea is that instead of replacing entries one
at a time, one can replace the entries of the data matrix column by column
and thus require only $O(N)$ ensembles. This argument from [24] is adapted
here along with new insights for dealing with nonindependence of the entries
and is outlined below.

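Now we set $X^{\mathbf v} = X$, $X^{\mathbf w} = \widetilde X$. For $1 \le \gamma \le N$, let $X_\gamma$ denote the random
matrix whose $j$th column is the same as that of $X^{\mathbf w}$ if $j > \gamma$ and that of $X^{\mathbf v}$
otherwise. In particular, we can choose $X_0 = X^{\mathbf w} = \widetilde X$ and $X_N = X^{\mathbf v} = X$,
where $X$ is the correlation matrix and $\widetilde X$ the corresponding covariance matrix.
As before, we define

$m_\gamma(z) = \frac{1}{N}\operatorname{Tr} G_\gamma(z), \quad G_\gamma(z) = (X_\gamma^\dagger X_\gamma - z)^{-1},$

so that we have the telescoping sum

(3.7) $\mathbb{E}^{\mathbf v} F(N\eta_0 \Im m^{\mathbf v}(z)) - \mathbb{E}^{\mathbf w} F(N\eta_0 \Im m^{\mathbf w}(z)) = \sum_{\gamma=1}^{N} [\mathbb{E}F(N\eta_0 \Im m_\gamma(z)) - \mathbb{E}F(N\eta_0 \Im m_{\gamma-1}(z))].$

Clearly, (3.2) will follow from (3.7) and the following estimate:

(3.8) $|\mathbb{E}F(N\eta_0 \Im m_\gamma(z)) - \mathbb{E}F(N\eta_0 \Im m_{\gamma-1}(z))| \le O_\varepsilon(N^{-1-\delta})$

for some $\delta > 0$. Our strategy to obtain (3.8) is the following. First notice
that

$\mathbb{E}F(N\eta_0 \Im m_\gamma(z)) - \mathbb{E}F(N\eta_0 \Im m_{\gamma-1}(z)) = \mathbb{E}F(\eta_0 \Im \operatorname{Tr} G_\gamma(z)) - \mathbb{E}F(\eta_0 \Im \operatorname{Tr} G_{\gamma-1}(z)).$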
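The column-by-column interpolation behind the telescoping sum can be sketched in a few lines. The sketch assumes the convention that $X_0$ is the raw data matrix and $X_N$ the standardized one; the names `hybrid` and `functional` are hypothetical, and the placeholder functional $\operatorname{Tr}(X^\dagger X)^2$ merely stands in for the Green function functional of the paper.

```python
import math
import random


def standardize_column(col):
    """Divide a column by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in col))
    return [v / norm for v in col]


def hybrid(raw_cols, gamma):
    """Matrix X_gamma as a list of columns: columns 1..gamma standardized, the rest raw."""
    return [standardize_column(c) if j < gamma else list(c)
            for j, c in enumerate(raw_cols)]


def functional(cols):
    """Placeholder functional Tr (X^T X)^2, the sum of squared column inner products."""
    return sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in cols for v in cols)


rng = random.Random(2)
M, N = 8, 5
raw_cols = [[rng.gauss(0.0, 1.0) / math.sqrt(M) for _ in range(M)] for _ in range(N)]

# One increment per column replacement: N terms instead of M * N.
increments = [functional(hybrid(raw_cols, g)) - functional(hybrid(raw_cols, g - 1))
              for g in range(1, N + 1)]
total = functional(hybrid(raw_cols, N)) - functional(hybrid(raw_cols, 0))
```

Consecutive ensembles `hybrid(raw_cols, g - 1)` and `hybrid(raw_cols, g)` differ in exactly one column, so only $N$ increments have to be controlled, each at the level $O_\varepsilon(N^{-1-\delta})$.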

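Let $X^{(\gamma)}$ be the $M \times (N - 1)$ matrix obtained by removing the $\gamma$th column
of $X_\gamma$, which has the same distribution as the $M \times (N - 1)$ matrix obtained
by removing the $\gamma$th column of $X_{\gamma-1}$. Define

(3.9) $G^{(\gamma)} = ((X^{(\gamma)})^\dagger (X^{(\gamma)}) - z)^{-1}, \quad \mu = \eta_0 \Im \operatorname{Tr} G^{(\gamma)}.$

In Lemma 4.1 we will establish (3.8) by showing that

(3.10) $(\mathbb{E}F(\eta_0 \Im \operatorname{Tr} G_\gamma) - \mathbb{E}F(\mu)) - (\mathbb{E}F(\eta_0 \Im \operatorname{Tr} G_{\gamma-1}) - \mathbb{E}F(\mu)) = O_\varepsilon(N^{-1-\delta}).$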

Once (3.8) is verified, the main result follows by virtue of Theorems 3.1
and 3.2 as mentioned in the beginning of this section. Notice that since the
columns of the data matrices $X^{\mathbf v}$, $X^{\mathbf w}$ are assumed to be independent, $\mu$ is
independent of the $\gamma$th column of $X^{\mathbf v}$, $X^{\mathbf w}$ or, equivalently, the $\gamma$th column
of $X_\gamma$, $X_{\gamma-1}$.

Thus, it boils down to establishing (3.10) in the case $X_0 = X^{\mathbf w} = \widetilde X$ and
$X_N = X^{\mathbf v} = X$. Our proof relies on the key observation that even if the
entries of the $\gamma$th column vector $x_\gamma$ are not independent, the difference between
the moments of the entries of the standardized vector $x_\gamma$ and its unnormalized
counterpart $\widetilde x_\gamma$ is at least an order of magnitude smaller than those
of $\widetilde x_\gamma$. For instance, since $\widetilde x_{i\gamma} = O(N^{-1/2})$ for $1 \le i \le M$, for two independent
ensembles of covariance matrices $X^{\mathbf v}$ and $X^{\mathbf w}$ satisfying (1.1) and (1.2),
we have the bound

(3.11) $\mathbb{E}(\widetilde x^{\mathbf v}_{i\gamma})^3 - \mathbb{E}(\widetilde x^{\mathbf w}_{i\gamma})^3 = O(N^{-3/2}).$

On the other hand, if $\widetilde x_\gamma$ is the unnormalized counterpart of $x_\gamma$, as shown
in Lemma 5.5,

(3.12) $\mathbb{E}(x_{i\gamma})^3 - \mathbb{E}(\widetilde x_{i\gamma})^3 = O(N^{-5/2}).$

The above observation combined with a resolvent expansion (detailed in
Lemmas 4.3, 5.4 and 5.5) gives (3.10).
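A heuristic reason for the gain of one power of $N$ in (3.12) can be sketched as follows. This is only an informal expansion, assuming $\sigma_\gamma = 1$ and that the squared column norm is close to 1; the symbol $S$ is introduced here purely for illustration, and the rigorous statement is the one proved in Lemma 5.5.

```latex
% Heuristic sketch only; S is notation introduced for this illustration.
\begin{align*}
S &:= \|\widetilde x_\gamma\|_2^2 - 1
   = \sum_{k=1}^{M}\Bigl(\widetilde x_{k\gamma}^2 - \tfrac{1}{M}\Bigr),
\qquad
x_{i\gamma}^3 = \widetilde x_{i\gamma}^3\,(1+S)^{-3/2}
   = \widetilde x_{i\gamma}^3\Bigl(1 - \tfrac{3}{2}S + O(S^2)\Bigr), \\
\mathbb{E}\bigl[\widetilde x_{i\gamma}^3\, S\bigr]
  &= \mathbb{E}\,\widetilde x_{i\gamma}^5 - \tfrac{1}{M}\,\mathbb{E}\,\widetilde x_{i\gamma}^3
   = M^{-5/2}\bigl(\mathbb{E}\,q_{i\gamma}^5 - \mathbb{E}\,q_{i\gamma}^3\bigr)
   = O(N^{-5/2}).
\end{align*}
```

The terms with $k \ne i$ drop out of the second line because the entries of $\widetilde x_\gamma$ are independent and $\mathbb{E}(\widetilde x_{k\gamma}^2 - 1/M) = 0$; the $O(S^2)$ remainder is of the same order, since typically $S^2 = O(M^{-1})$ and $\widetilde x_{i\gamma}^3 = O(M^{-3/2})$.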

4. Proof of the main result. In this section we will prove (3.10) in the
case $X_0 = X^{\mathbf w} = \widetilde X$ and $X_N = X^{\mathbf v} = X$. As discussed above, it implies (3.2)
in Theorem 3.1. Similarly, one can prove (3.3) and (3.4) in Theorems 3.1
and 3.2, which completes the proof of Theorem 1.1, the main result of this
paper.

It is easy to see that (3.10) is a direct consequence of the following lemma. 
Lemma 4.1. Let $X$ be an $M \times N$ random matrix whose columns satisfy
the large deviation bounds (2.10), (2.11) and (2.12), for any $A_i \in \mathbb{C}$ and
$B_{ij} \in \mathbb{C}$ and for any $\zeta > 0$. The columns of $X$ are assumed to be mutually
independent. Furthermore, assume that the first column is given by

(4.1) $x_{i1} = \frac{\widetilde x_{i1}}{\|\widetilde x_1\|_2}, \quad 1 \le i \le M,$

where $\widetilde x_{i1}$ are i.i.d. random variables with mean zero and variance $M^{-1}$ and
have a subexponential decay in the tails as given by (1.2).

Let $\widetilde X$ be the random matrix whose entries have the same distribution
as $X$ except for the first column, and the first column of $\widetilde X$ is given by

$\widetilde X_{i1} = \widetilde x_{i1},$

where $\widetilde x_{i1}$ are as in (4.1). The columns of $\widetilde X$ are also assumed to be mutually
independent. Let $m$, $\widetilde m$ denote the empirical Stieltjes transforms of $X^\dagger X$,
$\widetilde X^\dagger \widetilde X$.

Then for any function $F$ satisfying (3.1), there exist $\delta > 0$, $\varepsilon_0 > 0$ depending
only on $C_1$ such that for any $\varepsilon < \varepsilon_0$ and for any real number $E$
satisfying

(4.2) $|E - \lambda_+| \le N^{-2/3+\varepsilon}, \quad \eta_0 = N^{-2/3-\varepsilon},$

we have

(4.3) $|\mathbb{E}F(N\eta_0 \Im m(z)) - \mathbb{E}F(N\eta_0 \Im \widetilde m(z))| \le O_\varepsilon(N^{-1-\delta}), \quad z = E + i\eta_0.$

Note: In this lemma $X$ and $\widetilde X$ are neither pure correlation nor pure covariance
matrices, but their respective first columns are distributed according
to the standardized data matrix and raw data matrix.

Remark 4.2. Under condition (4.2) (see [24]), we have the bound

(4.4) $C^{-1} \le |m_W(z)| \le C, \quad \Im m_W(z) = O_\varepsilon(N^{-1/3}), \quad z = E + i\eta_0.$

First we collect some properties of submatrices of a generic $M \times N$ matrix
$Q$ which can be proved using standard results from linear algebra.
Let $Q^{(1)}$ be the $M \times (N - 1)$ matrix obtained by removing the first column
of $Q$. Define

(4.5) $G_Q^{(1)} = ((Q^{(1)})^\dagger (Q^{(1)}) - z)^{-1}, \quad \mathcal{G}_Q^{(1)} = ((Q^{(1)})(Q^{(1)})^\dagger - z)^{-1}.$

Then by definition, $G_Q^{(1)}$ is an $(N - 1) \times (N - 1)$ matrix, $\mathcal{G}_Q^{(1)}$ is an $M \times M$
matrix and we have the identity

(4.6) $\operatorname{Tr} G_Q^{(1)}(z) - \operatorname{Tr} \mathcal{G}_Q^{(1)}(z) = \frac{M - N + 1}{z}.$

Using the Cauchy interlacing theorem (see Equation (8.5) of [10]), it can be
shown that

(4.7) $\operatorname{Tr} G_Q^{(1)}(z) - \operatorname{Tr} G_Q(z) = O(\eta^{-1}), \quad \eta = \Im z.$
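The identity (4.6) and the interlacing bound (4.7) can be checked numerically on a small matrix. The sketch below assumes (4.6) in the form $\operatorname{Tr} G_Q^{(1)} - \operatorname{Tr} \mathcal{G}_Q^{(1)} = (M - N + 1)/z$; the helper names, the sizes $M = 6$, $N = 4$ and the explicit constant $\pi$ in the bound (obtained from a total-variation estimate for interlacing spectra) are choices of this illustration, not statements taken from the paper.

```python
import math
import random


def resolvent_trace(a, z):
    """Tr (a - z)^{-1} for a real square matrix a, by Gauss-Jordan elimination on [a - z | I]."""
    n = len(a)
    aug = [[complex(a[i][j]) - (z if i == j else 0.0) for j in range(n)]
           + [1.0 + 0.0j if i == j else 0.0j for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return sum(aug[i][n + i] for i in range(n))


def gram(q):
    """Q^T Q (N x N) for a real M x N matrix stored as a list of rows."""
    m, n = len(q), len(q[0])
    return [[sum(q[i][a] * q[i][b] for i in range(m)) for b in range(n)]
            for a in range(n)]


def cogram(q):
    """Q Q^T (M x M) for a real M x N matrix stored as a list of rows."""
    return [[sum(u * v for u, v in zip(r1, r2)) for r2 in q] for r1 in q]


rng = random.Random(1)
M, N = 6, 4
Q = [[rng.gauss(0.0, 1.0) / math.sqrt(M) for _ in range(N)] for _ in range(M)]
Q1 = [row[1:] for row in Q]  # remove the first column of Q
z = 0.7 + 0.3j

lhs_46 = resolvent_trace(gram(Q1), z) - resolvent_trace(cogram(Q1), z)
rhs_46 = (M - N + 1) / z
gap_47 = resolvent_trace(gram(Q1), z) - resolvent_trace(gram(Q), z)
```

Here `lhs_46` agrees with `rhs_46` up to rounding because the two Gram matrices share their nonzero eigenvalues, and `abs(gap_47)` stays below $\pi/\Im z$.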



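Proof of Lemma 4.1. First we note that from Theorem 1.5 of [24],
the conclusions of Lemma 2.3 hold for both $X$ and $\widetilde X$.

Let $X^{(1)}$ be the $M \times (N - 1)$ matrix obtained by removing the first column
of $X$. Define

(4.8) $G^{(1)} = ((X^{(1)})^\dagger (X^{(1)}) - z)^{-1}, \quad \mathcal{G}^{(1)} = ((X^{(1)})(X^{(1)})^\dagger - z)^{-1}$

and as in (3.9) set

(4.9) $\mu = \eta_0 \Im \operatorname{Tr} G^{(1)} - \Im\frac{\eta_0}{z}.$

We will first verify that

(4.10) $\mathbb{E}F(\eta_0 \Im \operatorname{Tr} G) - \mathbb{E}F(\mu) = \mathbb{E}F^{(1)}(\mu)(\Im y_1 + \Im y_2 + \Im y_3) + \mathbb{E}F^{(2)}(\mu)\left(\tfrac{1}{2}(\Im y_1)^2 + \Im y_1 \Im y_2\right) + \mathbb{E}F^{(3)}(\mu)\left(\tfrac{1}{6}(\Im y_1)^3\right) + O_\varepsilon(N^{-4/3}),$

where $F^{(s)}$ denotes the $s$th derivative of $F$ and the $y_k$'s are defined as

(4.11) $y_k := \eta_0 z m_W (-B)^{k-1} (x_1, (\mathcal{G}^{(1)})^2 x_1),$

where $x_1$ denotes the first column of $X$. Define the quantity

(4.12) $B := -z m_W \left[ (x_1, \mathcal{G}^{(1)}(z) x_1) - \left( \frac{-1}{z m_W(z)} - 1 \right) \right].$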

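First, recall the following identity (see (6.23) of [24]):

(4.13) $\operatorname{Tr} G - \operatorname{Tr} G^{(1)} + z^{-1} = (G_{11} + z^{-1}) + \frac{(x_1, X^{(1)} G^{(1)} G^{(1)} X^{(1)\dagger} x_1)}{-z - z(x_1, \mathcal{G}^{(1)}(z) x_1)} = z G_{11} (x_1, (\mathcal{G}^{(1)})^2(z) x_1).$

Furthermore, as proved in Lemma 2.5 of [24],

(4.14) $G_{11}(z) = \frac{1}{-z - z(x_1, \mathcal{G}^{(1)}(z) x_1)}, \quad \text{that is,} \quad (x_1, \mathcal{G}^{(1)}(z) x_1) = \frac{-1}{z G_{11}(z)} - 1.$

From (4.12) and (4.14) we obtain that

$B = -z m_W \left[ \frac{-1}{z G_{11}(z)} + \frac{1}{z m_W(z)} \right] = \frac{m_W - G_{11}}{G_{11}}.$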
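The last equality in (4.13) can be verified in two lines. The sketch assumes (4.14) in the form $G_{11} = (-z - z(x_1, \mathcal{G}^{(1)} x_1))^{-1}$ and uses the abbreviations $A = X^{(1)}$, $G = G^{(1)}$, $\mathcal{G} = \mathcal{G}^{(1)}$, which are introduced only for this illustration.

```latex
% Abbreviations for this sketch: A = X^{(1)}, G = G^{(1)}, \mathcal{G} = \mathcal{G}^{(1)}.
\begin{align*}
A\,G^{2}A^{\dagger}
  &= (AA^{\dagger}-z)^{-1}\,AA^{\dagger}\,(AA^{\dagger}-z)^{-1}
   = \mathcal{G} + z\,\mathcal{G}^{2}, \\
G_{11}\bigl(1 + (x_1,\mathcal{G}x_1)\bigr) &= -z^{-1}
  \qquad\text{by (4.14)}, \\
(G_{11}+z^{-1}) + G_{11}\,(x_1, A G^{2} A^{\dagger} x_1)
  &= G_{11}\bigl(1 + (x_1,\mathcal{G}x_1)\bigr) + z^{-1}
     + z\,G_{11}\,(x_1,\mathcal{G}^{2}x_1)
   = z\,G_{11}\,(x_1,\mathcal{G}^{2}x_1).
\end{align*}
```

The first line uses the push-through relation $A(A^\dagger A - z)^{-1} = (AA^\dagger - z)^{-1}A$ together with $AA^\dagger\,\mathcal{G} = 1 + z\mathcal{G}$.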



EDGE UNIVERSALITY OF CORRELATION MATRICES 15 

Fix $\zeta > 0$. From (2.14), Remark 4.2 and the bound $|G_{11}| = |m_W| + o(1)$, it
follows that for $z = E + i\eta_0$,

(4.15) $|B| = \frac{|m_W - G_{11}|}{|G_{11}|} \le O_\varepsilon(N^{-1/3}) \ll 1$

with $\zeta$-high probability (see Definition 2.1). Therefore, with $\zeta$-high
probability, we have the identity

(4.16) $G_{11} = \frac{m_W}{B + 1} = m_W \sum_{k \ge 0} (-B)^k.$

Define $y$ to be the l.h.s. of (4.13) multiplied by $\eta_0$, that is,

$y = \eta_0(\operatorname{Tr} G - \operatorname{Tr} G^{(1)} + z^{-1}),$

so that using (4.13) and (4.16), we obtain

$y = \eta_0 z G_{11} (x_1, (\mathcal{G}^{(1)})^2 x_1) = \sum_{k=1}^{\infty} y_k.$

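Since $x_1$ satisfies (2.10), (2.11) and (2.12), and $\mathcal{G}^{(1)}$ is independent of $x_1$,
using Lemma 2.2, we infer that for some $C_\zeta > 0$

(4.17) $|(x_1, (\mathcal{G}^{(1)})^2 x_1)| \le \frac{1}{M}|\operatorname{Tr}(\mathcal{G}^{(1)})^2| + \frac{\varphi^{C_\zeta}}{M}\sqrt{\operatorname{Tr}|\mathcal{G}^{(1)}|^4}$

with $\zeta$-high probability. Using its definition, we bound $\operatorname{Tr}(\mathcal{G}^{(1)})^2$ as

(4.18) $|\operatorname{Tr}(\mathcal{G}^{(1)})^2| \le \operatorname{Tr}|\mathcal{G}^{(1)}|^2 = \frac{\Im \operatorname{Tr} \mathcal{G}^{(1)}}{\eta_0} = O_\varepsilon(N^{4/3}) + \frac{O(\eta_0^{-1})}{\eta_0} = O_\varepsilon(N^{4/3}),$

where for the last two equalities we have used (4.6), (4.7), (2.13) and (4.4).
Similarly, we bound the last term of (4.17) with

(4.19) $\operatorname{Tr}|\mathcal{G}^{(1)}|^4 \le \eta_0^{-2}\operatorname{Tr}|\mathcal{G}^{(1)}|^2 \le O_\varepsilon(N^{8/3})$

and obtain that

$|(x_1, (\mathcal{G}^{(1)})^2 x_1)| \le O_\varepsilon(N^{1/3}).$

Equation (4.15) and the fact $|z| + |m_W(z)| = O(1)$ yield that

(4.20) $|y_k| \le O_\varepsilon(N^{-k/3})$ and $|y| \le O_\varepsilon(N^{-1/3})$

hold with $\zeta$-high probability. Consequently, using (3.1) and (4.13), we see
that the expansion

(4.21) $F(\eta_0 \Im \operatorname{Tr} G) - F(\mu) = \sum_{k=1}^{3} \frac{1}{k!} F^{(k)}(\mu)(\Im y)^k + O_\varepsilon(N^{-4/3})$

holds with $\zeta$-high probability. From the bounds on $y_k$'s obtained above,
equation (4.10) follows.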
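The bookkeeping that leads from the third-order Taylor expansion to (4.10) can be made explicit. The sketch assumes the bounds $|y_k| \le O_\varepsilon(N^{-k/3})$ and the decomposition $y = \sum_{k \ge 1} y_k$, and simply collects the terms of size at least $N^{-1}$.

```latex
% Order counting, assuming |y_k| <= O_eps(N^{-k/3}) and y = sum_{k >= 1} y_k.
\begin{align*}
\Im y &= \Im y_1 + \Im y_2 + \Im y_3 + O_\varepsilon(N^{-4/3}), \\
\tfrac{1}{2}(\Im y)^2 &= \tfrac{1}{2}(\Im y_1)^2 + \Im y_1\,\Im y_2 + O_\varepsilon(N^{-4/3}), \\
\tfrac{1}{6}(\Im y)^3 &= \tfrac{1}{6}(\Im y_1)^3 + O_\varepsilon(N^{-4/3}).
\end{align*}
```

The discarded products, such as $(\Im y_2)^2$, $\Im y_1 \Im y_3$ and $(\Im y_1)^2 \Im y_2$, are all of order $N^{-4/3}$, which is why only the six monomials appearing in (4.10) survive.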

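Now we estimate $\widetilde G$, which is defined as

$\widetilde G = (\widetilde X^\dagger \widetilde X - z)^{-1}.$

Let $\widetilde X^{(1)}$ be the $M \times (N - 1)$ matrix obtained by removing the first column
of $\widetilde X$ and let $\widetilde x_1$ denote its first column. Proceeding as in the previous
calculations,

(4.22) $\mathbb{E}F(\eta_0 \Im \operatorname{Tr} \widetilde G) - \mathbb{E}F(\mu) = \mathbb{E}F^{(1)}(\mu)(\Im \widetilde y_1 + \Im \widetilde y_2 + \Im \widetilde y_3) + \mathbb{E}F^{(2)}(\mu)\left(\tfrac{1}{2}(\Im \widetilde y_1)^2 + \Im \widetilde y_1 \Im \widetilde y_2\right) + \mathbb{E}F^{(3)}(\mu)\left(\tfrac{1}{6}(\Im \widetilde y_1)^3\right) + O_\varepsilon(N^{-4/3}),$

where $\widetilde y_k$ is defined as in (4.11) with $x_1$ and $B$ replaced by $\widetilde x_1$ and $\widetilde B$, and

$\widetilde B = -z m_W \left[ (\widetilde x_1, \mathcal{G}^{(1)}(z) \widetilde x_1) - \left( \frac{-1}{z m_W(z)} - 1 \right) \right].$

Notice that $\mu$ appears in (4.22) because the entries of $X^{(1)}$ and $\widetilde X^{(1)}$ are
assumed to be identically distributed.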
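Define the matrices

(4.23) $Y = (\mathcal{G}^{(1)})^2, \quad Z = \mathcal{G}^{(1)}.$

The symmetric matrices $Y$ and $Z$ are independent of $x_1$ and $\widetilde x_1$. Clearly,
$YZ = ZY$. Therefore, using the fact that $z, m_W \sim 1$, we can write

$y_k = \eta_0 \sum_{0 \le n < k} C_{k,n}\,(x_1, Y x_1)(x_1, Z x_1)^n,$

where $C_{k,n} = O(1)$. Let $\mathcal{Y} = (x_1, Y x_1)$ and $\mathcal{Z} = (x_1, Z x_1)$. Then (4.10) can
be written as

(4.24) $\mathbb{E}F(\eta_0 \Im \operatorname{Tr} G) - \mathbb{E}F(\mu) = \mathbb{E}F^{(1)}(\mu)\,\eta_0 \sum_{0 \le n < k \le 3} \Im(C_{k,n}\mathcal{Y}\mathcal{Z}^n) + \mathbb{E}F^{(2)}(\mu)\,\eta_0^2 \left( \tfrac{1}{2}(\Im(C_{1,0}\mathcal{Y}))^2 + \Im(C_{1,0}\mathcal{Y})\,\Im(C_{2,0}\mathcal{Y} + C_{2,1}\mathcal{Y}\mathcal{Z}) \right) + \mathbb{E}F^{(3)}(\mu)\,\eta_0^3\,\tfrac{1}{6}(\Im(C_{1,0}\mathcal{Y}))^3 + O_\varepsilon(N^{-4/3}).$

Define $\widetilde{\mathcal{Y}} = (\widetilde x_1, Y \widetilde x_1)$ and $\widetilde{\mathcal{Z}} = (\widetilde x_1, Z \widetilde x_1)$. Using (4.22) and proceeding
similarly as before, we obtain that (4.24) also holds for the case when $G$, $\mathcal{Y}$
and $\mathcal{Z}$ are replaced with $\widetilde G$, $\widetilde{\mathcal{Y}}$ and $\widetilde{\mathcal{Z}}$, respectively. The following is the key
technical lemma of this paper whose proof is deferred to the next section.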

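Lemma 4.3. Let $f : \mathbb{R} \to \mathbb{R}$ be a function satisfying

(4.25) $\max_x |f(x)|\,(|x| + 1)^{-C} \le C$

for some constant $C$. Let $A$ be of the form

(4.26) $A = \eta_0^{a} \prod_{i=1}^{a} (x, Y_i x) \prod_{j=1}^{b} (x, Z_j x),$

where $Y_i = Y$ or $Y^*$ and $Z_j = Z$ or $Z^*$ with $Y$, $Z$ as defined in (4.23) and
$a$, $b$ are integers with $1 \le a \le 3$, $1 \le a + b \le 3$. Then, under the assumptions
of Lemma 4.1, we have

(4.27) $|\mathbb{E}(f(\mu)A) - \mathbb{E}(f(\mu)\widetilde A)| \le O_\varepsilon(N^{-7/6}),$

where $\widetilde A$ is obtained by replacing $x$ with $\widetilde x$ in (4.26).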

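Taking the difference of (4.24) and the equation obtained by replacing
$G$, $\mathcal{Y}$ and $\mathcal{Z}$ in (4.24) with $\widetilde G$, $\widetilde{\mathcal{Y}}$ and $\widetilde{\mathcal{Z}}$, we deduce that the difference

$\mathbb{E}F(\eta_0 \Im \operatorname{Tr} G) - \mathbb{E}F(\eta_0 \Im \operatorname{Tr} \widetilde G)$

can be approximated by the sum of $O(1)$ number of terms of the form
$\mathbb{E}(f(\mu)A) - \mathbb{E}(f(\mu)\widetilde A)$, where $A$ is as in (4.26) and $f$ is equal to $F^{(1)}$, $F^{(2)}$
and $F^{(3)}$. Therefore, by applying Lemma 4.3, we conclude that Lemma 4.1
holds with any $\delta < 1/6$ and the proof is finished. □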

Finally, we are ready to give the proof of the main result of this paper: 

Proof of Theorem 1.1. By the Green function comparison theorem 
discussed in Section 3, it only remains to prove that Theorems 3.1 and 3.2 
hold for the case 

= X, = X. 

For simplicity, we will only prove (3.2) of Theorem 3.1; the rest can be proved 
using almost identical arguments. 

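For $1 \le \gamma \le N$, let $X_\gamma$ denote the random matrix whose $j$th column is the same as that of $X^{\mathbf v}$ if $j > \gamma$ and that of $X^{\mathbf w}$ otherwise; in particular, $X_0 = X^{\mathbf v}$ and $X_N = X^{\mathbf w}$. As before, we define

$m_\gamma(z) = \frac{1}{N}\mathrm{Tr}\,G_\gamma(z),\quad G_\gamma(z) = (X_\gamma^\dagger X_\gamma - z)^{-1}.$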
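We then have the telescoping sum

(4.28) $\mathbb{E}F(N\eta_0 \Im m_N(z)) - \mathbb{E}F(N\eta_0 \Im m_0(z)) = \sum_{\gamma=1}^{N}\bigl[\mathbb{E}F(N\eta_0 \Im m_\gamma(z)) - \mathbb{E}F(N\eta_0 \Im m_{\gamma-1}(z))\bigr].$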



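Applying Lemma 4.1 to $X_\gamma$ and $X_{\gamma-1}$ gives the estimate

(4.29) $|\mathbb{E}F(N\eta_0 \Im m_\gamma(z)) - \mathbb{E}F(N\eta_0 \Im m_{\gamma-1}(z))| \le O_\varepsilon(N^{-1-\delta})$

for some $\delta > 0$. Now (3.2) follows from (4.28) and (4.29) and the proof is finished. □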

5. Moment computations. In this section we prove Lemma 4.3. For notational convenience, let us denote $x = x_1$, $\tilde x = \tilde x_1$. We will also write

$x(k) = x_{k1},\quad \tilde x(k) = \tilde x_{k1},\quad 1 \le k \le M.$

Recall $\mu$ from (4.9). For the rest of this section, $a, b$ will denote two integers satisfying

$1 \le a \le 3,\quad 1 \le a+b \le 3.$

Before stating the key results of this section, let us first give some definitions.

Definition 5.1 [$\chi(A,\mathbf{k})$]. For any partition $A$ of the set $\{1,2,\dots,2a+2b\}$, and a vector $\mathbf{k} = (k_1,k_2,\dots,k_{2a+2b})$, $k_i \in \{1,2,\dots,M\}$, define the binary function $\chi(A,\mathbf{k})$ as follows. The function $\chi(A,\mathbf{k})$ is equal to 1 if (1) for any $i,j$ in the same block of $A$ we have $k_i = k_j$, and (2) for any $i,j$ in different blocks of $A$ we have $k_i \ne k_j$; otherwise $\chi(A,\mathbf{k}) = 0$.
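Definition 5.1 can be evaluated mechanically. The following Python sketch is only an illustration: the function name `chi` and the representation of a partition as a list of sets of 1-based positions are assumptions of this sketch, not notation from the paper. The partition used in the usage lines is the one labeled (5.1).

```python
from typing import Sequence, Set


def chi(blocks: Sequence[Set[int]], k: Sequence[int]) -> int:
    """Indicator chi(A, k) of Definition 5.1.

    `blocks` lists the blocks of the partition A (1-based positions);
    `k` is the index vector (k_1, ..., k_{2a+2b}), stored 0-based.
    """
    # Map every position to the index of the block that contains it.
    block_of = {}
    for b, block in enumerate(blocks):
        for i in block:
            block_of[i] = b
    n = len(k)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            same_block = block_of[i] == block_of[j]
            same_value = k[i - 1] == k[j - 1]
            # Condition (1): same block forces equal values.
            # Condition (2): different blocks force distinct values.
            if same_block != same_value:
                return 0
    return 1


A = [{1}, {2, 4}, {3, 5, 6}]           # the partition in (5.1)
print(chi(A, [7, 2, 5, 2, 5, 5]))      # 1
print(chi(A, [2, 2, 5, 2, 5, 5]))      # 0, since k_1 = k_2 across blocks
```

The double loop compares every pair of positions, which is enough here because the vectors have at most six entries.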

Example 5.2. If

(5.1) $A = \{\{1\},\{2,4\},\{3,5,6\}\}$

and $a+b = 3$, then

$\chi(A,\mathbf{k}) = \mathbf{1}(k_2 = k_4)\,\mathbf{1}(k_3 = k_5 = k_6)\,\mathbf{1}(k_1 \ne k_2)\,\mathbf{1}(k_2 \ne k_3)\,\mathbf{1}(k_1 \ne k_3).$

Definition 5.3 [$\mathcal{N}(A,1)$, $\mathcal{N}(A,2)$ and $I(A,3)$]. Given a partition $A$ of the set $\{1,2,\dots,2a+2b\}$, let $\mathcal{N}(A,1)$ be the number of the blocks in $A$ that contain only one element of the set $\{1,2,\dots,2a+2b\}$. Let $\mathcal{N}(A,2)$ be the number of the blocks in $A$ of the form $\{2i-1,2i\}$ with $i > a$, that is, blocks pairing the two indices $k_{2i-1}, k_{2i}$ of a single factor $Z_{k_{2i-1}k_{2i}}$. Note that $\mathcal{N}(A,2)$ depends on $a$ and $b$ in addition to $A$. Let $I(A,3)$ be equal to one if and only if $a+b = 3$ and $A$ is composed of 2 blocks with three elements in each block.

The proof of Lemma 4.3 relies on Lemmas 5.4 and 5.5 stated below and proved at the end of this section.
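The three quantities of Definition 5.3 are simple counts, and a short Python sketch may help in checking small cases by hand. The function name `partition_stats` and the list-of-sets representation are assumptions of this sketch; it also assumes that a block "of the form $\{k_{2i-1},k_{2i}\}$ with $i>a$" means the pair of positions $\{2i-1,2i\}$.

```python
from typing import List, Set, Tuple


def partition_stats(blocks: List[Set[int]], a: int, b: int) -> Tuple[int, int, int]:
    """Return (N(A,1), N(A,2), I(A,3)) for a partition of {1, ..., 2a+2b}."""
    # N(A,1): number of singleton blocks.
    n1 = sum(1 for blk in blocks if len(blk) == 1)
    # N(A,2): number of blocks equal to a pair {2i-1, 2i} with i > a,
    # i.e. the two indices of one Z factor sitting alone in a block.
    z_pairs = [{2 * i - 1, 2 * i} for i in range(a + 1, a + b + 1)]
    n2 = sum(1 for blk in blocks if blk in z_pairs)
    # I(A,3): one iff a + b = 3 and A consists of two blocks of size three.
    i3 = int(a + b == 3 and len(blocks) == 2 and all(len(blk) == 3 for blk in blocks))
    return n1, n2, i3


print(partition_stats([{1}, {2, 4}, {3, 5, 6}], a=1, b=2))    # (1, 0, 0)
print(partition_stats([{1, 2}, {3, 4}, {5, 6}], a=1, b=2))    # (0, 2, 0)
print(partition_stats([{1, 3, 5}, {2, 4, 6}], a=1, b=2))      # (0, 0, 1)
```

In the second usage line the block {1, 2} is not counted, because it corresponds to a factor of $Y$ rather than $Z$ when $a = 1$.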
Lemma 5.4. Recall the matrices $Y, Z$ from (4.23). Then for any $\varepsilon > 0$ the following estimate

$\Bigl|\sum_{k_1,k_2,\dots,k_{2a+2b}=1}^{M} \chi(A,\mathbf{k})\,\eta_0^a\,(Y_{k_1k_2}\cdots Y_{k_{2a-1}k_{2a}})(Z_{k_{2a+1}k_{2a+2}}\cdots Z_{k_{2a+2b-1}k_{2a+2b}})\Bigr| \le O_\varepsilon\bigl((N^{1/2})^{\mathcal{N}(A,1)+I(A,3)}\,N^{2a/3+2b/3}\,(N^{1/3})^{\mathcal{N}(A,2)}\bigr)$

holds with $\zeta$-high probability for any fixed $\zeta > 0$. The result also holds if any of the $Y, Z$ are replaced by their complex conjugates $Y^*, Z^*$, respectively.

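Lemma 5.5. Let $y_i$ be i.i.d. random variables such that

$\mathbb{E}y_i = 0,\quad \mathbb{E}(y_i)^2 = M^{-1},\quad 1 \le i \le M,$

and have a subexponential decay as in (1.2). Let $A$ be a partition of the set $\{1,2,\dots,2a+2b\}$ and let

$\tilde y_i = \frac{y_i}{(\sum_j y_j^2)^{1/2}}.$

Then for any vector $\mathbf{k} = (k_1,k_2,\dots,k_{2a+2b})$ and for any $\varepsilon > 0$, we have

(5.2) $\Bigl|\mathbb{E}\Bigl(\chi(A,\mathbf{k})\prod_{i=1}^{2a+2b}\tilde y_{k_i}\Bigr) - \mathbb{E}\Bigl(\chi(A,\mathbf{k})\prod_{i=1}^{2a+2b} y_{k_i}\Bigr)\Bigr| \le O_\varepsilon\bigl(N^{-(a+b)}(N^{-1})^{\max\{\mathcal{N}(A,1),1\}}\bigr).$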

With the above two lemmas in hand, we are now ready to give the proof 
of Lemma 4.3. 

Proof of Lemma 4.3. We will only prove the case when

(5.3) $Y_i = Y,\quad Z_i = Z$

for all $i$ and, thus,

$A = \eta_0^a (x, Yx)^a (x, Zx)^b.$

The other cases can be proved similarly. First, let us write (4.26) as

$\eta_0^a (x, Yx)^a (x, Zx)^b = \sum_{A}\sum_{k_1,k_2,\dots,k_{2a+2b}=1}^{M} \eta_0^a\,\chi(A,\mathbf{k})\Bigl(\prod_{i=1}^{2a+2b} x(k_i)\Bigr)(Y_{k_1k_2}\cdots Y_{k_{2a-1}k_{2a}})$

$\qquad \times (Z_{k_{2a+1}k_{2a+2}}\cdots Z_{k_{2a+2b-1}k_{2a+2b}}),$
where the summation index $A$ ranges over all the partitions of the set $\{1,2,\dots,2a+2b\}$. Taking expectations, and using the fact that $x$ is independent of $Y, Z$ and $\mu$, leads to

AI / 



2a+26 

X 

1=1 



n ^i'^i)0^klk2---Yk2a-lk2a) 

lk2a+2 ' ' ' ^k2a+2b-lk2a+2b) 



(5.4) 



(2a+2b 
i=l 

/ M 



X 



E/(/x) E • • • Yk2.-,k2.) 

\ fci,fc2,...,fc2a+26 = l 

X {^k2a+\k2a+2 ' ' ' '^A;2a+2b- 1 ^=20+26 )^ ' 

where the last inequality follows from the fact that $\bigl(\mathbb{E}\chi(A,\mathbf{k})\prod_{i=1}^{2a+2b} x(k_i)\bigr)$ is independent of $Y, Z$. Combining (5.4), Lemmas 5.4 and 5.5, we deduce that

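$|\mathbb{E}(f(\mu)A) - \mathbb{E}(f(\mu)\tilde A)|$

(5.5) $\le \sum_{A} O_\varepsilon\bigl((N^{-1/3})^{a+b}(N^{1/2})^{\mathcal{N}(A,1)+I(A,3)}(N^{-1})^{\max\{\mathcal{N}(A,1),1\}}(N^{1/3})^{\mathcal{N}(A,2)}\bigr)$

$\quad \le \sum_{A} O_\varepsilon\bigl((N^{-a/3})(N^{1/2})^{\mathcal{N}(A,1)+I(A,3)}(N^{-1})^{\max\{\mathcal{N}(A,1),1\}}(N^{1/3})^{\mathcal{N}(A,2)-b}\bigr).$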

Now we claim that the terms in the r.h.s. of (5.5) are bounded by $O_\varepsilon(N^{-7/6})$. Indeed, note that $\mathcal{N}(A,1) > 0$ implies $I(A,3) = 0$. Therefore, the worst-case scenario is the case in which

$a = 1,\quad b = \mathcal{N}(A,2)\quad\text{and}\quad \mathcal{N}(A,1) = 1,$

since by definition we have $\mathcal{N}(A,2) \le b$. But it is easy to see that the above scenario cannot occur, since if the first two conditions hold, then it follows that $\mathcal{N}(A,1) = 0$ or $2$. Thus, we have finished the proof of Lemma 4.3. □
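The case analysis above can be confirmed by brute force. The sketch below is only a sanity check under stated assumptions: it reads the exponent of $N$ in the r.h.s. of (5.5) as $-a/3 + (\mathcal{N}(A,1)+I(A,3))/2 - \max\{\mathcal{N}(A,1),1\} + (\mathcal{N}(A,2)-b)/3$, takes $b \ge 0$, and ignores the $N^{\varepsilon}$ factors hidden in $O_\varepsilon$. The helper names `set_partitions` and `exponent` are hypothetical.

```python
from fractions import Fraction


def set_partitions(elements):
    """Yield every partition of `elements` as a list of blocks (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for idx in range(len(smaller)):
            yield smaller[:idx] + [[first] + smaller[idx]] + smaller[idx + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def exponent(blocks, a, b):
    """Exponent of N in the (assumed) bound for one partition A."""
    n1 = sum(1 for blk in blocks if len(blk) == 1)
    z_pairs = [{2 * i - 1, 2 * i} for i in range(a + 1, a + b + 1)]
    n2 = sum(1 for blk in blocks if set(blk) in z_pairs)
    i3 = int(a + b == 3 and len(blocks) == 2 and all(len(blk) == 3 for blk in blocks))
    return Fraction(-a, 3) + Fraction(n1 + i3, 2) - max(n1, 1) + Fraction(n2 - b, 3)


# Scan all admissible (a, b) and all partitions of {1, ..., 2a+2b}.
worst = max(
    exponent(A, a, b)
    for a in range(1, 4)
    for b in range(0, 4 - a)
    for A in set_partitions(list(range(1, 2 * a + 2 * b + 1)))
)
print(worst)  # -7/6
```

The maximum is attained, for instance, at $a = b = 1$ and $A = \{\{1\},\{2,3,4\}\}$, where $\mathcal{N}(A,1) = 1$ and $\mathcal{N}(A,2) = 0$.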

Proof of Lemma 5.4. Note that all of the bounds in this lemma hold with $\zeta$-high probability, not in expectation. For simplicity, we will subsume this in the notation.
First let us prove a slightly different result. Define the binary function $\tilde\chi(A,\mathbf{k})$ [similar to $\chi(A,\mathbf{k})$] as follows. $\tilde\chi(A,\mathbf{k})$ is equal to 1 when both of the following hold: (1) for any $i,j$ in the same block of $A$ we have $k_i = k_j$, (2) if $i,j$ are in different blocks of $A$, we have $k_i \ne k_j$, except that if one of the indices $i,j$ is in a block of $A$ which contains exactly two elements, then $k_i$ is allowed to be equal to $k_j$. In all other instances $\tilde\chi(A,\mathbf{k}) = 0$. For instance, in the previous example (5.1), we have

$\tilde\chi(A,\mathbf{k}) = \mathbf{1}(k_2 = k_4)\,\mathbf{1}(k_3 = k_5 = k_6)\,\mathbf{1}(k_1 \ne k_3).$

We first claim that

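(5.6) $\Bigl|\sum_{k_1,k_2,\dots,k_{2a+2b}=1}^{M} \tilde\chi(A,\mathbf{k})\,\eta_0^a\,(Y_{k_1k_2}\cdots Y_{k_{2a-1}k_{2a}})(Z_{k_{2a+1}k_{2a+2}}\cdots Z_{k_{2a+2b-1}k_{2a+2b}})\Bigr| \le O_\varepsilon\bigl((N^{1/2})^{\mathcal{N}(A,1)+I(A,3)}\,N^{2a/3+2b/3}\,(N^{1/3})^{\mathcal{N}(A,2)}\bigr).$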

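Let us first prove (5.6) when $I(A,3) = 0$. Define the functions

$g_1(m) := \mathrm{Tr}|Z^m|,\quad g_2(m) := \sqrt{\mathrm{Tr}|Z|^{2m}},\quad 1 \le m \le 2a+b.$

We will show that

(5.7) l.h.s. (5.6) $\le O\Bigl(\eta_0^a (N^{1/2})^{\mathcal{N}(A,1)}\prod_i g_{\alpha_i}(m_i)\Bigr),$

where $\alpha_i \in \{1,2\}$ and $m_i \le 2a+b$.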

To this end, we will use the following 2-1-3 rule:

• 2: If the index $i$ appears in a block of $A$ which contains exactly two elements, first sum up over the index $k_i$. Then estimate the remaining terms with absolute sum. For example, let $A = \{\{1\},\{2,3\},\{4\}\}$. Recall that $Y = Z^2$,

$\sum_{k_1,k_2,k_4}\tilde\chi(A,\mathbf{k})\,Y_{k_1k_2}Z_{k_2k_4} = \sum_{k_1 \ne k_4}(YZ)_{k_1k_4} \le \sum_{k_1,k_4}|(YZ)_{k_1k_4}| = \sum_{k_1,k_4}|(Z^3)_{k_1k_4}|.$

• 1: Next do the summation over the index $k_i$ if $i$ appears in the block of $A$ which contains only one element as follows:

$\sum_{k}|(Z^m)_{kl}| \le C N^{1/2}\sqrt{(|Z|^{2m})_{ll}},\qquad \sum_{k,l}|(Z^m)_{kl}| \le C N\sqrt{\mathrm{Tr}|Z|^{2m}}.$

In the above inequalities, we have used the Cauchy-Schwarz inequality and the fact that $Z$ is a symmetric matrix. Note that each summation of the above kind brings an extra $N^{1/2}$ factor.

• 3: Finally, sum up over the other indices. After the first two steps, (5.6) will be reduced to the product of following terms:

$(N^{1/2})^{\mathcal{N}(A,1)},\quad |\mathrm{Tr}\,Z^r|,\quad \sqrt{\mathrm{Tr}|Z|^{2r}},\quad r \le 2a+b,$

and terms of the form

(5.8) $\sum_k \prod_{i=1}^{m}|(Z^{m_i})_{kk}|\prod_{j=1}^{n}\sqrt{(|Z|^{2n_j})_{kk}},\quad 2 \le m+n.$

If $m+n = 2$, then using the Cauchy-Schwarz inequality, (5.8) can be estimated as

(5.9) $\sum_k \prod_{i=1}^{m}|(Z^{m_i})_{kk}|\prod_{j=1}^{n}\sqrt{(|Z|^{2n_j})_{kk}} \le \prod_{i=1}^{m}\sqrt{\mathrm{Tr}|Z|^{2m_i}}\prod_{j=1}^{n}\sqrt{\mathrm{Tr}|Z|^{2n_j}}.$

For $m+n > 2$, we bound $m+n-2$ of them [$|(Z^{m_i})_{kk}|$ or $\sqrt{(|Z|^{2n_j})_{kk}}$] by the maximum as follows:

$|(Z^{m_i})_{kk}| \le \max_k |(Z^{m_i})_{kk}| \le \sqrt{\mathrm{Tr}|Z|^{2m_i}},$

$\sqrt{(|Z|^{2n_j})_{kk}} \le \max_k \sqrt{(|Z|^{2n_j})_{kk}} \le \sqrt{\mathrm{Tr}|Z|^{2n_j}},$

to reduce to the case of $m+n = 2$ and use the bound (5.9).

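Let us give an example in the case $a = 1$, $b = 2$ and $A = \{\{1\},\{2,3\},\{4,5,6\}\}$. Then the term (5.6) in this case reduces to

$\sum_{k_1,k_2,k_4}\eta_0\,Y_{k_1k_2}Z_{k_2k_4}Z_{k_4k_4} \le \sum_{k_1,k_4}\eta_0\,|(Z^3)_{k_1k_4}|\,|Z_{k_4k_4}|,$

where the above inequality is obtained by applying rule 2. Next, applying rule 1 yields

$\le \sum_{k_4}\eta_0\,N^{1/2}\sqrt{(|Z|^{6})_{k_4k_4}}\,|Z_{k_4k_4}|$

and, finally, applying rule 3 leads to the bound

$\le \eta_0\,N^{1/2}\sqrt{\mathrm{Tr}|Z|^{6}}\sqrt{\mathrm{Tr}|Z|^{2}}.$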

Using this 2-1-3 rule described above, we obtain (5.7). By the definition of the 2-1-3 rule, it is easy to see that

(5.10) $\sum_i m_i = 2a+b.$

Recall $\eta_0 = N^{-2/3-\varepsilon}$. Using (4.18) and (4.19), we deduce that if $\alpha_i m_i \ne 1$, then

$g_{\alpha_i}(m_i) \le O_\varepsilon(N^{2m_i/3}).$
For $\alpha_i m_i = 1$, using (4.6), (4.7), (2.13) and $m_W = O(1)$, we see that $g_1(1) = O_\varepsilon(N)$. Thus,

(5.11) $g_{\alpha_i}(m_i) \le O_\varepsilon(N^{2m_i/3})(N^{1/3})^{\mathbf{1}(\alpha_i m_i = 1)}.$

Combining equations (5.7)-(5.11), we have the

(5.12) l.h.s. of equation (5.6) $= O_\varepsilon\bigl((N^{1/2})^{\mathcal{N}(A,1)}\,N^{2a/3+2b/3}\,(N^{1/3})^{\#\{i:\,\alpha_i m_i = 1\}}\bigr).$

Now notice that by the definition, the term $g_1(1)$ in (5.7) can only be created during the first step of the 2-1-3 rule, that is, the 2 rule, and, therefore, we deduce that

$\mathcal{N}(A,2) = \#\{i : \alpha_i m_i = 1\},$

which completes the proof of the claim made in (5.6) for the case $I(A,3) = 0$. Now consider the case $I(A,3) = 1$. Using the fact that $Y, Z$ are symmetric matrices and the relation $Y = Z^2$, we deduce that the term

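$\sum_{k_1,k_2,\dots,k_{2a+2b}}\tilde\chi(A,\mathbf{k})(Y_{k_1k_2}\cdots Y_{k_{2a-1}k_{2a}})(Z_{k_{2a+1}k_{2a+2}}\cdots Z_{k_{2a+2b-1}k_{2a+2b}})$

reduces to one of the following situations:

(5.13) $\sum_{k_1,k_2=1}^{M}(Z^{m_1})_{k_1k_1}(Z^{m_2})_{k_1k_2}(Z^{m_3})_{k_2k_2}\quad\text{or}\quad \sum_{k_1,k_2=1}^{M}(Z^{m_1})_{k_1k_2}(Z^{m_2})_{k_1k_2}(Z^{m_3})_{k_1k_2}$

for $m_i \in \{1,2\}$, $i \in \{1,2,3\}$. We bound the first scenario above as

(5.14) $\Bigl|\sum_{k_1,k_2}(Z^{m_1})_{k_1k_1}(Z^{m_2})_{k_1k_2}(Z^{m_3})_{k_2k_2}\Bigr| \le \sum_{k_1,k_2}|(Z^{m_1})_{k_1k_1}(Z^{m_2})_{k_1k_2}|\max_k|(Z^{m_3})_{kk}|$

$\qquad \le \sum_{k_1,k_2}|(Z^{m_1})_{k_1k_1}(Z^{m_2})_{k_1k_2}|\sqrt{\mathrm{Tr}|Z|^{2m_3}}.$

Using rule 1 and rule 3 above yields

$\sum_{k_1,k_2}|(Z^{m_1})_{k_1k_1}(Z^{m_2})_{k_1k_2}| \le C N^{1/2}\sqrt{\mathrm{Tr}|Z|^{2m_1}}\sqrt{\mathrm{Tr}|Z|^{2m_2}}$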

k\k2 
and, thus,

M 
ki,k2=l 

— ^J\l-'ia/3+l/2 -J ^^2/3(mi +1712+1713)-^ 

where in the last inequality we have used the fact that $\sum_j m_j = 2a+b$. For the second case in (5.13), first we note



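$\max_{k,l}|(Z^m)_{kl}| \le \sqrt{\mathrm{Tr}|Z|^{2m}}.$

Now using the Cauchy-Schwarz inequality,

$\sum_{k_1,k_2}|(Z^{m_1})_{k_1k_2}(Z^{m_2})_{k_1k_2}(Z^{m_3})_{k_1k_2}| \le \sqrt{\mathrm{Tr}|Z|^{2m_1}}\sqrt{\mathrm{Tr}|Z|^{2m_2}}\sqrt{\mathrm{Tr}|Z|^{2m_3}}$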

and, thus, 

M 

fcl,fc2 = l 

= 0^(A^~2a/3-jQ^|-jY2/3(mi+m2+m3)-j 

Summarizing the above computations, and noticing that $\mathcal{N}(A,1) = \mathcal{N}(A,2) = 0$ when $I(A,3) = 1$, we obtain the bound

a + lk2a + 2 ' ' ' Z k2a + 2b-lk2a + 2b)\ 

proving the claim (5.6) when $I(A,3) = 1$.

Now we return to prove Lemma 5.4. One can see that for any partition $A$ of the set $\{1,2,\dots,2a+2b\}$ and a vector $\mathbf{k}$, the function $\tilde\chi(A,\mathbf{k})$ can be written as a linear combination of the functions $\chi(A_i,\mathbf{k})$ for some partitions $A_i$ of the set $\{1,2,\dots,2a+2b\}$ such that

$\mathcal{N}(A_i,1) \le \mathcal{N}(A,1),\quad \mathcal{N}(A_i,2) \le \mathcal{N}(A,2),\quad I(A_i,3) = I(A,3).$

For instance, for $A$ given in (5.1),

$\tilde\chi(A,\mathbf{k}) = \mathbf{1}(k_2 = k_4)\,\mathbf{1}(k_3 = k_5 = k_6)\,\mathbf{1}(k_1 \ne k_3),$

we have the identity

$\chi(A,\mathbf{k}) = \tilde\chi(A,\mathbf{k}) - \chi(A_1,\mathbf{k}) - \chi(A_2,\mathbf{k}),$

where $A_1 = \{\{1\},\{2,3,4,5,6\}\}$ and $A_2 = \{\{1,2,4\},\{3,5,6\}\}$. Now the lemma follows from (5.6) and the proof is finished. □
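The identity relating the relaxed indicator to the strict one for the partition (5.1) can be verified exhaustively for a small index range. The following Python sketch does this for $M = 3$, which already allows three distinct values. The function names `chi` and `chi_tilde` and the list-of-sets representation are assumptions of this sketch.

```python
from itertools import product


def _block_map(blocks):
    """Map each 1-based position to the block (a set) containing it."""
    return {i: blk for blk in blocks for i in blk}


def chi(blocks, k):
    """Strict indicator: equal inside blocks, distinct across blocks."""
    bm = _block_map(blocks)
    n = len(k)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if (bm[i] is bm[j]) != (k[i - 1] == k[j - 1]):
                return 0
    return 1


def chi_tilde(blocks, k):
    """Relaxed indicator: indices in a two-element block may coincide
    with indices of other blocks."""
    bm = _block_map(blocks)
    n = len(k)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            equal = k[i - 1] == k[j - 1]
            if bm[i] is bm[j]:
                if not equal:
                    return 0
            else:
                exempt = len(bm[i]) == 2 or len(bm[j]) == 2
                if equal and not exempt:
                    return 0
    return 1


A = [{1}, {2, 4}, {3, 5, 6}]
A1 = [{1}, {2, 3, 4, 5, 6}]
A2 = [{1, 2, 4}, {3, 5, 6}]

# Check chi(A) = chi_tilde(A) - chi(A1) - chi(A2) for every k in {1,2,3}^6.
identity_holds = all(
    chi(A, k) == chi_tilde(A, k) - chi(A1, k) - chi(A2, k)
    for k in product(range(1, 4), repeat=6)
)
print(identity_holds)  # True
```

The two subtracted partitions correspond to the two ways the pair block $\{2,4\}$ can collide with another block: with $\{3,5,6\}$ (giving $A_1$) or with $\{1\}$ (giving $A_2$).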

Proof of Lemma 5.5. For any $k_1,k_2,\dots,k_m \in \{1,2,\dots,M\}$ and $m \in \mathbb{N}$, by definition we have



m{A,k)l[yk,=EI{A,k) 



(5.15) 



i=l 



nm ~ 

(E,-y?r/' 



:EX(^,k)nyfc. 



i=l 



M , 



-m/2 



Using large deviation bounds, it is easy to see that for any $\varepsilon > 0$

M 

(5.16) 



7=1 ^ ^ 



Therefore, by the Taylor expansion, 

2a+2fe 2a+26 

EX(Ak) n yfc,-EX(^,k) J] yfc^ 

(5.17) 

/2a+26 \ / 

x(Ak) n J 



n=l 



A/ " / 1 

E n ^ 



^ri,r2,.--,'''n=l j=l 



M 



where $C_n = C_{a,b,n}$ is a combinatorial factor. Using (5.16), the r.h.s. of equation (5.17) may be expressed as



no 



(5.18) 



n=l 



(2a+26 ^ 
n 
i=l / 



' M n , 

^ri,r2,...,r„=l j=l 



~2 

M 



+ 0.((iV 



-l/2N2a+26+no 



for some fixed $n_0 \in \mathbb{N}$ (say, $n_0 = 20$).

Since $n_0, a, b = O(1)$, the combinatorial factors do not increase with $N$, that is, $C_n = O(1)$, and, thus, we can bound



(5.19) 



E 



(2a+2fe \ / " / 1 \ ' 
as follows. Notice that the number of distinct indices $k_i$ in (5.19) is equal to the number of blocks in the partition $A$. Thus, for a given set of values for the indices $r_1,r_2,\dots,r_n$, the term (5.19) is nonzero only if at least $\mathcal{N}(A,1)$ of the indices $r_j$ belong to the set $\{k_1,k_2,\dots,k_{2a+2b}\}$. The above observation also implies that for (5.19) to be nonzero we must have

(5.20) $n \ge \mathcal{N}(A,1).$

Furthermore, the indices $r_j$ which do not belong to the set $\{k_1,k_2,\dots,k_{2a+2b}\}$ must appear more than once since $\mathbb{E}(1/M - y_i^2) = 0$. This crucial observation implies that, if the term (5.19) is nonzero and

(5.21) $\mathcal{N}(A,1) = 0$, then $n \ge 2$.

Therefore, the number of nonzero terms in the sum



(5.22) 



M 

E 

r-i,r2,...,r„ 



E 



X(^,k 




is $O((N^{1/2})^{n-\mathcal{N}(A,1)})$. Each of these terms is of the size $O_\varepsilon(N^{-(a+b)-n})$, yielding



M 

E 



E 



(5.23) 



/2a+2b 



x(Ak) n 

V i=l 

~{a+b)-n/2-M(A,l)/2 




Combining (5.23) with (5.20) and the observation made in (5.21), we obtain 
that 



M 



(5.24) 



/2a+26 



ri,r2,...,rn=l L \ i=l 

obtaining (5.2), and the proof is finished. □ 